Insights into Machine Learning: Data Clustering and Classification Algorithms for Astrophysical Experiments

نویسندگان

  • ALESSANDRO DE ANGELIS
  • PRAVEEN BOINEE
چکیده

Data analysis domain dealing with data exploration, clustering and classification is an important problem in many experiments of astrophysics, computer vision, bioinformatics etc. The field of machine learning is increasingly becoming popular for performing these tasks. In this thesis we deal with machine learning models based on unsupervised and supervised learning algorithms. In unsupervised learning category, we deal with Self-Organizing Map (SOM) with new kernel function. The data visualization/exploration and clustering capabilities of SOM are experimented with real world data set problems for finding groups in data (cluster discovery) and visualisation of these clusters. Next we discuss ensembling learning, a specialized technique within the supervised learning field. Ensemble learning algorithms such as AdaBoost and Bagging have been in active research and shown improvements in classification results for several benchmarking data sets. They grow multiple learner algorithms and combine them for getting better accuracy results. Generally decision trees learning algorithm is used as base classifiers to grow these ensembles. In this thesis we experiment with Random Forests (RF) and Back-Propagation Neural Networks (BPNN) as base classifiers for making ensembles. Random Forests is a recent development in tree based classifiers and quickly proven to be one of the most important algorithms in the machine learning literature. It has shown robust and improved results of classifications on standard data sets. We experiment the working of the ensembles of random forests on the standard data sets available in University of California Irvine (UCI) data base. We compare the original random forest algorithm with their ensemble counterparts and discuss the results. Finally we deal the problem of image data classification with both supervised (ensemble) learning and unsupervised learning. We apply the algorithms developed in the thesis for this task. These image data are taken from the MAGIC telescope experiment, which collects the images of particle rays coming from the outer universe. We apply the ensembles of RF, BPNN for making a supervised classification of images and compare the performance results. Then we discuss a SOM system, developed for making an automatic classification of images using the unsupervised techniques.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

An improved opposition-based Crow Search Algorithm for Data Clustering

Data clustering is an ideal way of working with a huge amount of data and looking for a structure in the dataset. In other words, clustering is the classification of the same data; the similarity among the data in a cluster is maximum and the similarity among the data in the different clusters is minimal. The innovation of this paper is a clustering method based on the Crow Search Algorithm (CS...

متن کامل

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing where there exist many low-level representations of image, i.e., SIFT, HOG and so on. But there is a missing link across low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation and principle component analysis are employed to d...

متن کامل

Diagnosis of Heart Disease Based on Meta Heuristic Algorithms and Clustering Methods

Data analysis in cardiovascular diseases is difficult due to large massive of information. All of features are not impressive in the final results. So it is very important to identify more effective features. In this study, the method of feature selection with binary cuckoo optimization algorithm is implemented to reduce property. According to the results, the most appropriate classification fo...

متن کامل

Classification of encrypted traffic for applications based on statistical features

Traffic classification plays an important role in many aspects of network management such as identifying type of the transferred data, detection of malware applications, applying policies to restrict network accesses and so on. Basic methods in this field were using some obvious traffic features like port number and protocol type to classify the traffic type. However, recent changes in applicat...

متن کامل

یادگیری نیمه نظارتی کرنل مرکب با استفاده از تکنیک‌های یادگیری معیار فاصله

Distance metric has a key role in many machine learning and computer vision algorithms so that choosing an appropriate distance metric has a direct effect on the performance of such algorithms. Recently, distance metric learning using labeled data or other available supervisory information has become a very active research area in machine learning applications. Studies in this area have shown t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006